62 research outputs found

    BioSStore: A Client Interface for a Repository of Semantically Annotated Bioinformatics Web Services

    Get PDF
    Bioinformatics has shown itself to be a domain in which Web services are being used extensively. In this domain, simple but real services are being developed. Thus, there are huge repositories of real services available (for example BioMOBY main repository includes more than 1500 services). Besides, bioinformatics repositories usually have active communities using and working on improvements. However, these kinds of repositories do not exploit the full potential of Web services (and SOA, Service Oriented Applications, in general). On the other hand, sophisticated technologies have been proposed to improve SOA, including the annotation on Web services to explicitly describe them. However, these approaches are lacking in repositories with real services. In the work presented here, we address the drawbacks present in bioinformatics services and try to improve the current semantic model by introducing the use of the W3C standard Semantic Annotations for WSDL and XML Schema (SAWSDL) and related proposals (WSMO Lite). This paper focuses on a user interface that takes advantage of a repository of semantically annotated bioinformatics Web services. In this way, we exploit semantics for the discovery of Web services, showing how the use of semantics will improve the user searches. The BioSStore is available at http://biosstore.khaos.uma.es. This portal will contain also future developments of this proposal

    Biopax and Semantics

    Get PDF
    Biopax community is producing sets of data in RDF files, but most of them are not available through query interfaces. The publication of SPARQL endpoints is feasible with current sets of data, but the use of reasoning in these interfaces is unfeasible in many cases. The use of large scale reasoners is a need to take advantage of these data sets

    A Service for Flexible Management and Analysis of Heterogeneous Clinical Data

    Get PDF
    Este documento describe FIMED 2.0, un servicio para la gestión flexible y análisis de datos clínicos heterogéneos. Esta herramienta de software permite la gestión flexible de datos clínicos de múltiples ensayos, lo que puede ayudar a mejorar la calidad de los datos clínicos y facilitar los ensayos clínicos. El servicio propuesto se ha desarrollado sobre una base de datos NoSQL (MongoDB) que permite recoger e integrar los datos clínicos en esquemas dinámicos e incrementales en función de sus necesidades y de los requisitos de la investigación clínica. requisitos de la investigación clínica. Basándonos en nuestras experiencias con la Gestión Flexible de Datos Biomédicos (FIMED), hemos desarrollado esta nueva versión de la herramienta con el objetivo no sólo de replicar la anterior, sino también de incluir más análisis de redes reguladoras de genes y visualización de datos orientados a anotar la funcionalidad de los genes e identificar los genes centrales. Esta versión permite Esta versión permite al profesional utilizar cuatro métodos diferentes de construcción de redes, como como la asimilación de datos, la interpolación lineal, el conjunto basado en árboles o la regresión Boosting Machine. Puede encontrar una versión gratuita de esta herramienta en la web https://khaos.uma.es/fimedV2. Se ha creado una cuenta de usuario de demostración para proporcionar una demostración de usuario, "iwbbio", utilizando la contraseña "demo". Un caso de uso real para un ensayo clínico en la enfermedad del melanoma también se incluye en esta demostración, que sí ha sido anonimizada.Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tec

    Bioqueries: a collaborative environment to create, explore and share SPARQL queries in Life Sciences

    Get PDF
    Bioqueries provides a collaborative environment to create, explore, execute, clone and share SPARQL queries (including Federated Queries). Federated SPARQL queries can retrieve information from more than one data source.Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech

    Melanoma expression analysis with Big Data technologies

    Get PDF
    Melanoma is a highly immunogenic tumor. Therefore, in recent years physicians have incorporated drugs that alter the immune system into their therapeutic arsenal against this disease, revolutionizing in the treatment of patients in an advanced stage of the disease. This has led us to explore and deepen our knowledge of the immunology surrounding melanoma, in order to optimize its approach. At present, immunotherapy for metastatic melanoma is based on stimulating an individual’s own immune system through the use of specific monoclonal antibodies. The use of immunotherapy has meant that many of patients with melanoma have survived and therefore it constitutes a present and future treatment in this field. At the same time, drugs have been developed targeting specific mutations, specifically BRAF, resulting in large responses in tumor regression (set up in this clinical study to 18 months), as well as a higher percentage of long-term survivors. The analysis of the gene expression changes and their correlation with clinical changes can be developed using the tools provided by those companies which currently provide gene expression platforms. The gene expression platform used in this clinical study is NanoString, which provides nCounter. However, nCounter has some limitations as the type of analysis is restricted to a predefined set, and the introduction of clinical features is a complex task. This paper presents an approach to collect the clinical information using a structured database and a Web user interface to introduce this information, including the results of the gene expression measurements, to go a step further than the nCounter tool. As part of this work, we present an initial analysis of changes in the gene expression of a set of patients before and after targeted therapy. This analysis has been carried out using Big Data technologies (Apache Spark) with the final goal being to scale up to large numbers of patients, even though this initial study has a limited number of enrolled patients (12 in the first analysis). This is not a Big Data problem, but the underlaying study aims at targeting 20 patients per year just in Málaga, and this could be extended to be used to analyze the 3.600 patients diagnosed with melanoma per year.Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech. This work was funded in part by Grants TIN2014-58304-R (Ministerio de Ciencia e Innovación) and P11-TIC-7529 and P12-TIC-1519 (Plan Andaluz de Investigación, Desarrollo e Innovación). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript

    NORA: Scalable OWL reasoner based on NoSQL databasesand Apache Spark

    Get PDF
    Reasoning is the process of inferring new knowledge and identifying inconsis-tencies within ontologies. Traditional techniques often prove inadequate whenreasoning over large Knowledge Bases containing millions or billions of facts.This article introduces NORA, a persistent and scalable OWL reasoner built ontop of Apache Spark, designed to address the challenges of reasoning over exten-sive and complex ontologies. NORA exploits the scalability of NoSQL databasesto effectively apply inference rules to Big Data ontologies with large ABoxes. Tofacilitatescalablereasoning,OWLdata,includingclassandpropertyhierarchiesand instances, are materialized in the Apache Cassandra database. Spark pro-grams are then evaluated iteratively, uncovering new implicit knowledge fromthe dataset and leading to enhanced performance and more efficient reasoningover large-scale ontologies. NORA has undergone a thorough evaluation withdifferent benchmarking ontologies of varying sizes to assess the scalability of thedeveloped solution.Funding for open access charge: Universidad de Málaga / CBUA This work has been partially funded by grant (funded by MCIN/AEI/10.13039/501100011033/) PID2020-112540RB-C41,AETHER-UMA (A smart data holistic approach for context-aware data analytics: semantics and context exploita-tion). Antonio Benítez-Hidalgo is supported by Grant PRE2018-084280 (Spanish Ministry of Science, Innovation andUniversities)

    KNIT: Ontology reusability through knowledge graph exploration

    Get PDF
    Ontologies have become a standard for knowledge representation across several domains. In Life Sciences, numerous ontologies have been introduced to represent human knowledge, often providing overlapping or conflicting perspectives. These ontologies are usually published as OWL or OBO, and are often registered in open repositories, e.g., BioPortal. However, the task of finding the concepts (classes and their properties) defined in the existing ontologies and the relationships between these concepts across different ontologies – for example, for developing a new ontology aligned with the existing ones – requires a great deal of manual effort in searching through the public repositories for candidate ontologies and their entities. In this work, we develop a new tool, KNIT, to automatically explore open repositories to help users fetch the previously designed concepts using keywords. User-specified keywords are then used to retrieve matching names of classes or properties. KNIT then creates a draft knowledge graph populated with the concepts and relationships retrieved from the existing ontologies. Furthermore, following the process of ontology learning, our tool refines this first draft of an ontology. We present three BioPortal-specific use cases for our tool. These use cases outline the development of new knowledge graphs and ontologies in the sub-domains of biology: genes and diseases, virome and drugs.This work has been funded by grant PID2020-112540RB-C4121, AETHER-UMA (A smart data holistic approach for context-aware data analytics: semantics and context exploitation). Funding for open access charge: Universidad de Málaga / CBUA

    A Fine Grain Sentiment Analysis with Semantics in Tweets

    Get PDF
    Social networking is nowadays a major source of new information in the world. Microblogging sites like Twitter have millions of active users (320 million active users on Twitter on the 30th September 2015) who share their opinions in real time, generating huge amounts of data. These data are, in most cases, available to any network user. The opinions of Twitter users have become something that companies and other organisations study to see whether or not their users like the products or services they offer. One way to assess opinions on Twitter is classifying the sentiment of the tweets as positive or negative. However, this process is usually done at a coarse grain level and the tweets are classified as positive or negative. However, tweets can be partially positive and negative at the same time, referring to different entities. As a result, general approaches usually classify these tweets as “neutral”. In this paper, we propose a semantic analysis of tweets, using Natural Language Processing to classify the sentiment with regards to the entities mentioned in each tweet. We offer a combination of Big Data tools (under the Apache Hadoop framework) and sentiment analysis using RDF graphs supporting the study of the tweet’s lexicon. This work has been empirically validated using a sporting event, the 2014 Phillips 66 Big 12 Men’s Basketball Championship. The experimental results show a clear correlation between the predicted sentiments with specific events during the championship

    Ensemble-based genetic algorithm explainer with automized image segmentation: A case study on melanoma detection dataset

    Get PDF
    Explainable Artificial Intelligence (XAI) makes AI understandable to the human user particularly when the model is complex and opaque. Local Interpretable Model-agnostic Explanations (LIME) has an image explainer package that is used to explain deep learning models. The image explainer of LIME needs some parameters to be manually tuned by the expert in advance, including the number of top features to be seen and the number of superpixels in the segmented input image. This parameter tuning is a time-consuming task. Hence, with the aim of developing an image explainer that automizes image segmentation, this paper proposes Ensemblebased Genetic Algorithm Explainer (EGAE) for melanoma cancer detection that automatically detects and presents the informative sections of the image to the user. EGAE has three phases. First, the sparsity of chromosomes in GAs is determined heuristically. Then, multiple GAs are executed consecutively. However, the difference between these GAs are in different number of superpixels in the input image that result in different chromosome lengths. Finally, the results of GAs are ensembled using consensus and majority votings. This paper also introduces how Euclidean distance can be used to calculate the distance between the actual explanation (delineated by experts) and the calculated explanation (computed by the explainer) for accuracy measurement. Experimental results on a melanoma dataset show that EGAE automatically detects informative lesions, and it also improves the accuracy of explanation in comparison with LIME efficiently. The python codes for EGAE, the ground truths delineated by clinicians, and the melanoma detection dataset are available at https://github.com/KhaosResearch/EGAEThis work has been partially funded by grant PID2020-112540RBC41 (funded by MCIN/AEI/10.13039/501100011033/, Spain), AETHERUMA, Spain (A smart data holistic approach for context-aware data analytics: semantics and context exploitation). Funding for open access charge: Universidad de Málaga/CBUA. Additionally, we thank Dr. Miguel Ángel Berciano Guerrero from Unidad de Oncología Intercentros, Hospitales Univesitarios Regional Virgen de la Victoria de Málaga, and Instituto de Investigaciones Biomédicas (IBIMA), Málaga, Spain, for his support in images selection and general medical orientation in the particular case of Melanoma

    GENECI: A novel evolutionary machine learning consensus-based approach for the inference of gene regulatory networks

    Get PDF
    Gene regulatory networks define the interactions between DNA products and other substances in cells. Increasing knowledge of these networks improves the level of detail with which the processes that trigger different diseases are described and fosters the development of new therapeutic targets. These networks are usually represented by graphs, and the primary sources for their correct construction are usually time series from differential expression data. The inference of networks from this data type has been approached differently in the literature. Mostly, computational learning techniques have been implemented, which have finally shown some specialization in specific datasets. For this reason, the need arises to create new and more robust strategies for reaching a consensus based on previous results to gain a particular capacity for generalization. This paper presents GENECI (GEne NEtwork Consensus Inference), an evolutionary machine learning approach that acts as an organizer for constructing ensembles to process the results of the main inference techniques reported in the literature and to optimize the consensus network derived from them, according to their confidence levels and topological characteristics. After its design, the proposal was confronted with datasets collected from academic benchmarks (DREAM challenges and IRMA network) to quantify its accuracy. Subsequently, it was applied to a real-world biological network of melanoma patients whose results could be contrasted with medical research collected in the literature. Finally, it has been proved that its ability to optimize the consensus of several networks leads to outstanding robustness and accuracy, gaining a certain generalization capacity after facing the inference of multiple datasetsThis work has been partially funded by grant (funded by MCIN/AEI/10.13039/501100011033/) PID2020-112540RB-C41, AETHER-UMA, Spain (A smart data holistic approach for context-aware data analytics: semantics and context exploitation) and Andalusian PAIDI program, Spain with grant P18-RT-2799. Funding for open access charge: Universidad de Málaga, Spain/CBUA. Adrián SeguraOrtiz is supported by Grant FPU21/03837 (Spanish Ministry of Science, Innovation and Universities, Spain
    corecore